Db2 Jupyter: Using Prepared Statements

Normally the %sql magic command is used to execute an SQL statement immediately and return the result. If the statement needs to be executed multiple times with different values, this approach is inefficient because the SQL statement must be recompiled on every execution.

The PREPARE and EXECUTE commands allow the user to compile (optimize) the statement once and then re-execute it with different parameter values.

In addition, the commit scope can be modified so that not every statement gets committed immediately. By managing the commit scope, overhead in the database engine can be avoided.


In [ ]:
%run db2.ipynb

Autocommit and Commit Scope

By default, any SQL statement executed with the %sql magic command is committed immediately. This means that the transaction details are written to the log and the results are hardened to disk. In other words, you can't change your mind after the statement finishes execution.

This behavior is often referred to as AUTOCOMMIT and it adds a level of overhead to statement execution because at the end of every statement the results must be "hardened". On the other hand, autocommit means you don't have to worry about explicitly committing work, or about causing locking problems by holding onto resources. When a record is updated, no other user will be able to view it (unless they use a "dirty read") until you commit. Holding a resource in a lock means that other workloads may come to a halt while they wait for you to commit your work.

Here is a classic example of wanting a commit scope that is based on a series of statements:

withdrawal = 100
%sql update checking set balance = balance - :withdrawal
%sql update savings set balance = balance + :withdrawal

If autocommit is ON, you could have a problem with the transaction if the system failed after the first update statement. You would have taken money out of the checking account, but would not have updated the savings account. To make sure that this transaction runs as a single unit of work:

%sql autocommit off
withdrawal = 100
%sql update checking set balance = balance - :withdrawal
%sql update savings set balance = balance + :withdrawal
%sql commit work

If the transaction fails before the COMMIT WORK, all changes will be rolled back and the database returned to its original state, thus protecting the integrity of the two tables.

AUTOCOMMIT

Autocommit can be turned on or off using the following syntax:

%sql AUTOCOMMIT ON | OFF

If you turn AUTOCOMMIT OFF then you need to make sure that you COMMIT your work at the end of your code. If you don't, you may lose your work if the connection to Db2 is lost.

COMMIT, ROLLBACK

To COMMIT all changes to the database you must use the following syntax:

%sql COMMIT [WORK | HOLD]

The commands COMMIT and COMMIT WORK are identical and will commit all work to the database. Issuing a COMMIT also closes any open cursors or statements. If you had created a prepared statement (see the section below) then the compiled statement will no longer be valid. By issuing a COMMIT you release all of the resources and locks that your application may be holding. If you want to discard your changes rather than commit them, issue a ROLLBACK instead, which undoes all work done since the last commit.

COMMIT HOLD will allow you to commit your work to disk, but keeps all of the resources open for further execution. This is useful in situations where you are inserting or updating thousands of records and do not want to tie up log space waiting for a commit to occur. The following pseudocode gives an example of how this would be used:

%sql autocommit off
for i = 1 to 1000
    %sql insert into x values i
    if (i % 100 == 0)
       print i "Records inserted"
       %sql commit hold
    end if
end for
%sql commit work
%sql autocommit on
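
The pseudocode above can be turned into a runnable Python block in this notebook. This is only a sketch, assuming a table X with a single integer column:

%sql autocommit off
for i in range(1,1001):
    %sql insert into x values (:i)
    if i % 100 == 0:
        print(i,"Records inserted")
        %sql commit hold
%sql commit work
%sql autocommit on

Committing every 100 rows keeps the amount of uncommitted log data small, while COMMIT HOLD leaves the statement resources open so the loop can continue without interruption.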

You should always remember to turn AUTOCOMMIT ON at the end of any code block; otherwise every subsequent SQL statement will need an explicit COMMIT to make its changes permanent in the database.

PREPARE and EXECUTE

The PREPARE and EXECUTE commands are useful in situations where you want to repeat an SQL statement multiple times while changing only the parameter values. There isn't any benefit from using these statements for simple tasks that may only run occasionally. The benefit of PREPARE/EXECUTE is more evident when dealing with a large number of transactions that are the same.

The PREPARE statement can be used against many types of SQL, but in this implementation, only the following SQL statements are supported:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • MERGE

To prepare a statement, you must use the following syntax:

stmt = %sql PREPARE sql ....

The PREPARE statement always returns a statement handle. You must assign the results of the PREPARE statement to a variable since it will be required when you EXECUTE the statement.

The SQL statement must have any variables replaced with a question mark ?. For instance, if you wanted to insert a single value into a table you would use the following syntax:

stmt = %sql PREPARE insert into x values (?)

One important note about parameter markers: if you require a parameter to have a specific data type (say INTEGER), you may want to place a CAST around it to force the proper conversion. Strings, integers, decimals and so on usually convert fine with this syntax, but occasionally you may run into a data type issue. The previous example could be modified to:

stmt = %sql PREPARE insert into x values (CAST(? AS INTEGER))

Once you have prepared a statement, you can execute it using the following syntax:

%sql EXECUTE :stmt USING :v1,:v2,:v3,....

You must provide the variable names with a colon : in front of them and separate each one with a comma. This allows the SQL parser to differentiate between a host variable and a column or SQL keyword. You can also use constants as part of the EXECUTE statement:

%sql EXECUTE :stmt USING 3,'asdsa',24.5

Using variables is more convenient when dealing with strings that may contain single or double quotes.
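
Putting these pieces together, a minimal sketch of the PREPARE/EXECUTE flow might look like the following. The table X with one integer column and one character column is assumed purely for illustration:

stmt = %sql PREPARE insert into x values (CAST(? AS INTEGER), ?)

for empid, name in [(1,"O'Brien"),(2,'Smith')]:
    %sql EXECUTE :stmt USING :empid,:name

%sql EXECUTE :stmt USING 3,'Jones'

Passing the names through host variables means the embedded single quote in O'Brien does not need to be escaped by hand.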

Using Arrays and Multiple Parameters

When using the PREPARE statement, dealing with many parameter markers can become cumbersome. For instance, to insert a row with 10 columns into a table the code would look similar to this:

stmt = %sql PREPARE INSERT INTO X VALUES (?,?,?,?,?,?,?,?,?,?)

The %sql command allows you to use the short form ?*# where # is an integer representing the number of parameter markers you want in the list. The above statement could be written as:

stmt = %sql PREPARE INSERT INTO X VALUES (?*10)

The syntax can also be used to create groups of parameter markers:

stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)

While this may seem a strange way of providing parameters, this becomes more useful when we use the EXECUTE command.

The EXECUTE command can use Python arrays (lists) as input arguments. For the previous example with 10 parameters you could issue the following command:

%sql EXECUTE :stmt USING :v1,:v2,:v3,:v4,:v5,:v6,:v7,:v8,:v9,:v10

If you placed all of these values into an array, you could also do the following:

%sql EXECUTE :stmt USING :v[0],:v[1],:v[2],:v[3],:v[4],:v[5],:v[6],:v[7],:v[8],:v[9]

That isn't much simpler, but it shows that you can use individual items within an array (one-dimensional only). The easiest syntax is:

%sql EXECUTE :stmt USING :v

The only requirement is that the array v contains exactly the number of values needed to satisfy the parameter list of the prepared statement.

When you split the argument list into groups, you can use multiple arrays to contain the data. Given the following prepare statement:

stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)

You could execute the statement using two arrays:

%sql EXECUTE :stmt USING :name, :details

This would work as long as the total number of parameters supplied by the name array and details array is equal to 10.
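
As a concrete sketch, assume a table X with ten columns and the sample values shown below (both are assumptions for illustration only):

stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)

name    = ['000011','Katrina','Lawson']
details = ['D11','4567','2018-01-01','ANALYST',18,'F',75000.00]

%sql EXECUTE :stmt USING :name, :details

The name list supplies the first three parameter markers and the details list supplies the remaining seven, for a total of exactly ten values.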

Performance Comparisons

The following examples will show the use of AUTOCOMMIT and PREPARE/EXECUTE when running SQL statements.

This first SQL statement will load the EMPLOYEE and DEPARTMENT tables (if they don't already exist) and then return an array of all of the employees in the company using a SELECT statement.


In [ ]:
%sql -sampledata
employees = %sql -r select * from employee

The employees variable contains all of the employee data as a Python array. The next statement will retrieve the contents of the first data row only (remember that row 0 contains the names of the columns).


In [ ]:
print(employees[1])

We now will create another EMPLOYEE table that is an exact duplicate of what we already have.


In [ ]:
%%sql -q
DROP TABLE EMPLOYEE2;
CREATE TABLE EMPLOYEE2 AS (SELECT * FROM EMPLOYEE) DEFINITION ONLY;

Loop with INSERT Statements

One could always use a single SQL statement to populate this table, but here we will use a loop that executes individual INSERT statements. The loop will be timed so we can get a sense of the cost of running this code. To make the loop run longer, the insert block is repeated 50 times.


In [ ]:
import time        # used to measure elapsed load time (harmless if db2.ipynb already imported it)

%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
for k in range(0,50):
    for record in employees[1:]:
        i += 1
        empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record
        %sql -q INSERT INTO EMPLOYEE2 VALUES ( \
            :empno,:firstnme,:midinit, \
            :lastname,:workdept,:phoneno, \
            :hiredate,:job,:edlevel, \
            :sex,:birthdate,:salary, \
            :bonus,:comm) 
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_insert = end_time-start_time

Loop with PREPARE Statement

An alternative method is to use a prepared statement, which allows us to compile the statement once in Db2 and then reuse it from Db2's memory. This method uses the individual column values as input to the EXECUTE statement.


In [ ]:
%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
for k in range(0,50):
    for record in employees[1:]:
        i += 1
        empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record
        %sql execute :prep using :empno,:firstnme,:midinit,:lastname,:workdept,:phoneno,:hiredate,:job,:edlevel,:sex,:birthdate,:salary,:bonus,:comm
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_prepare = end_time-start_time

Loop with PREPARE Statement and Arrays

You will notice that it is a bit tedious to write out all of the columns that are required as part of an INSERT statement. A simpler option is to use a Python list (array) and supply it directly in the EXECUTE statement. So rather than:

%sql execute :prep using :empno, :firstnme, ...

We can just use the array variable generated as part of the for loop:

%sql execute :prep using :record

The following SQL demonstrates this approach.


In [ ]:
%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
for k in range(0,50):
    for record in employees[1:]:
        i += 1
        %sql execute :prep using :record

end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_array = end_time-start_time

Loop with PREPARE Statement, Arrays and AUTOCOMMIT OFF

Finally, we can turn AUTOCOMMIT off and then commit the work at the end of the block to improve the total time required to insert the data. Note the use of the parameter short form ?*14 in the code.


In [ ]:
%sql -q DELETE FROM EMPLOYEE2
%sql autocommit off
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?*14)
for k in range(0,50):
    for record in employees[1:]:
        i += 1
        %sql execute :prep using :record
%sql commit work                                           
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
%sql autocommit on   
time_commit = end_time-start_time

Performance Comparison

You may have noticed that the last method is substantially faster than the other examples. The primary reason for this is that the COMMIT occurs only once, at the end of the code.


In [ ]:
%%sql -pb
WITH RESULT(RUN,ELAPSED) AS (
  VALUES
    ('INSERT',CAST(:time_insert AS DEC(5,2))),
    ('PREPARE',CAST(:time_prepare AS DEC(5,2))),
    ('ARRAY ',CAST(:time_array AS DEC(5,2))),
    ('COMMIT ',CAST(:time_commit AS DEC(5,2)))   
)
SELECT RUN, ELAPSED FROM RESULT
  ORDER BY ELAPSED DESC

Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]